
    On the use of semantic awareness to limit overfitting in genetic programming

    Dissertation presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.

    Machine learning and statistics provide powerful tools for solving problems of many different shapes. However, because these algorithms search for approximations, the problem of overfitting remains. Genetic Programming is an algorithmic approach that is particularly likely to produce overfitting solutions. To lessen the risk of overfitting and increase the generalization ability of genetic programming, this work assesses several ways of using semantic information. A multi-objective system that drives the population away from overfitting solutions based on semantic distance is presented, alongside alternatives and extensions. The extensions include the use of the semantic signature to increase the amount of information available to the system, as well as the possibility of replacing the validation dataset. On the one hand, it is concluded that neither the described approaches nor any of the extensions have a positive impact on generalization ability. On the other hand, the semantics do seem to contain enough information to discriminate appropriately between overfitting and non-overfitting individuals.
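    To make the idea of a semantics-based, multi-objective criterion more concrete, here is a minimal sketch under stated assumptions; it is not the dissertation's method. It assumes that the semantics of a GP individual is its vector of outputs on a set of input cases, that semantic distance is Euclidean, and that the two objectives are training RMSE (minimized) and distance from the semantics of a known overfitting reference individual (maximized). The names `semantics`, `semantic_distance`, `objectives`, `dominates` and `overfit_reference` are hypothetical illustrations.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch: a "program" is any callable mapping an input vector to a number.
Program = Callable[[Sequence[float]], float]


def semantics(program: Program, cases: Sequence[Sequence[float]]) -> List[float]:
    """Semantics of a program: its vector of outputs on the given input cases."""
    return [program(x) for x in cases]


def semantic_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two semantic vectors (an assumed metric)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root-mean-square error between predictions and targets."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))


def objectives(program: Program, cases, targets, overfit_reference) -> Tuple[float, float]:
    """Assumed objective pair: (training error to minimize,
    distance from an overfitting reference semantics to maximize)."""
    s = semantics(program, cases)
    return rmse(s, targets), semantic_distance(s, overfit_reference)


def dominates(f1: Tuple[float, float], f2: Tuple[float, float]) -> bool:
    """Pareto dominance for (error: minimize, distance: maximize)."""
    err1, dist1 = f1
    err2, dist2 = f2
    no_worse = err1 <= err2 and dist1 >= dist2
    strictly_better = err1 < err2 or dist1 > dist2
    return no_worse and strictly_better


if __name__ == "__main__":
    cases = [[0.0], [1.0], [2.0]]
    targets = [0.0, 1.0, 2.0]
    overfit_reference = [0.0, 1.0, 5.0]  # hypothetical semantics of an overfitting individual

    a = objectives(lambda x: x[0], cases, targets, overfit_reference)
    b = objectives(lambda x: 2 * x[0], cases, targets, overfit_reference)
    print(a, b, dominates(a, b))
```

    In this toy run, the identity program has zero training error and lies farther from the overfitting reference than `2 * x`, so it Pareto-dominates it. A selection scheme built on this kind of dominance check would favour individuals that fit the data while staying semantically away from known overfitting behaviour.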